Search results for "bioinformatics"
showing 10 items of 74 documents
Discriminating graph pattern mining from gene expression data
2016
We consider the problem of mining gene expression data in order to single out interesting features that characterize healthy/unhealthy samples of an input dataset. We present an approach based on a network model of the input gene expression data, where there is a labelled graph for each sample. To the best of our knowledge, this is the first attempt to build a different graph for each sample and, then, to have a database of graphs for representing a sample set. Our main goal is to single out interesting differences between healthy and unhealthy samples, through the extraction of "discriminating patterns" among graphs belonging to the two different sample sets. Differently from the …
kmcEx: memory-frugal and retrieval-efficient encoding of counted k-mers.
2018
Motivation: K-mers along with their frequency have served as an elementary building block for error correction, repeat detection, multiple sequence alignment, genome assembly, etc., attracting intensive studies in k-mer counting. However, the output of k-mer counters itself is large; very often, it is too large to fit into main memory, leading to highly narrowed usability. Results: We introduce a novel idea of encoding k-mers as well as their frequency, achieving good memory saving and retrieval efficiency. Specifically, we propose a Bloom filter-like data structure to encode counted k-mers by coupled-bit arrays, one for k-mer representation and the other for frequency encoding. Exper…
Toward completion of the Earth’s proteome: an update a decade later
2017
Protein databases are steadily growing, driven by the spread of new, more efficient sequencing techniques. This growth is dominated by an increase in redundancy (homologous proteins with various degrees of sequence similarity) and by the inability to process and curate sequence entries as fast as they are created. To understand these trends and aid bioinformatic resources that might be compromised by the increasing size of the protein sequence databases, we have created a less-redundant protein data set. In parallel, we analyzed the evolution of protein sequence databases in terms of size and redundancy. While the SwissProt database has decelerated its growth mostly because of a focus on i…
ValWorkBench: an open source Java library for cluster validation, with applications to microarray data analysis.
2015
Background: Cluster analysis is one of the most well known activities in scientific investigation and the object of research in many disciplines, ranging from statistics to computer science. It is central to the life sciences due to the advent of high throughput technologies, e.g., classification of tumors. In particular, in cluster analysis, it is of relevance to assess cluster quality and to predict the number of clusters in a dataset, if any. This latter task is usually performed via internal validation measures. Despite their potentially important role, both the use of classic internal validation measures and the design of new ones, specific for microarray data, do not seem to have grea…
Biomolecular computers with multiple restriction enzymes
2017
The development of conventional, silicon-based computers has several limitations, including some related to the Heisenberg uncertainty principle and the von Neumann “bottleneck”. Biomolecular computers based on DNA and proteins are largely free of these disadvantages and, along with quantum computers, are reasonable alternatives to their conventional counterparts in some applications. The idea of a DNA computer proposed by Ehud Shapiro’s group at the Weizmann Institute of Science was developed using one restriction enzyme as hardware and DNA fragments (the transition molecules) as software and input/output signals. This computer represented a two-state two-symbol finite automaton t…
INVESTIGATION OF BIOTIC STRESS RESPONSES IN FRUIT TREE CROPS USING META-ANALYTICAL TECHNIQUES.
2020
In recent years, RNA sequencing and analysis using Next Generation Sequencing (NGS) methods have made it possible to understand gene expression under plant biotic and abiotic stress conditions in both a quantitative and a qualitative manner. The large number of transcriptomic works published in plants calls for more meta-analysis studies that would identify common and specific features across the many studies performed under different developmental and environmental conditions. Meta-analysis of transcriptomic data will identify commonalities and differences between differentially regulated gene lists and will allow screening for which genes are key players in gene-gene and p…
Evaluation of HIV transmission clusters among natives and foreigners living in Italy
2020
We aimed at evaluating the characteristics of HIV-1 molecular transmission clusters (MTCs) among natives and migrants living in Italy, diagnosed between 1998 and 2018. Phylogenetic analyses were performed on HIV-1 polymerase (pol) sequences to characterise subtypes and identify MTCs, divided into small (SMTCs, 2–…
The Relationship Between Polygenic Risk Scores and Cognition in Schizophrenia
2020
Background: Cognitive impairment is a clinically important feature of schizophrenia. Polygenic risk score (PRS) methods have demonstrated genetic overlap between schizophrenia, bipolar disorder (BD), major depressive disorder (MDD), educational attainment (EA), and IQ, but very few studies have examined associations between these PRS and cognitive phenotypes within schizophrenia cases. Methods: We combined genetic and cognitive data in 3034 schizophrenia cases from 11 samples using the general intelligence factor g as the primary measure of cognition. We used linear regression to examine the association between cognition and PRS for EA, IQ, schizophrenia, BD, and MDD. The results wer…
Multi-omics analysis of epithelial-to-mesenchymal transition mediators in breast cancer
2022
An Intelligent System for Decision Support in Bioinformatics
2011
The enormous array of computational techniques and data available due to today's use of high-throughput technologies can be quite overwhelming for researchers investigating biological problems. For any problem, there are many possible models and algorithms giving different results. We present a new Intelligent System that supports the selection, configuration and operation of strategies and tools in the bioinformatics domain. The Institute for High Performance Computing and Networking (ICAR-CNR) and the University of Palermo are developing an intelligent system that supports bioinformatics research. The system guides the researcher in building a data analysis workflow and acts as an interfa…